This report provides an analysis of PM2.5 air quality in San Mateo County using sensor data from PurpleAir. The analysis is carried out on three fronts: (i) the geographic distribution of air quality across county jurisdictions; (ii) an equity analysis to identify any disproportionate burden of air pollution across income and racial demographics; and (iii) combining the collected data into a predictive scoring system that recommends the next best location for a sensor based on user-defined weightings.
To gather more information on the geographic distribution of sensors across San Mateo County, the locations of PurpleAir's sensors were extracted, and the corresponding air quality index (AQI) ranges reported by these sensors were plotted in the following map:
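For reference, AQI values for PM2.5 are conventionally derived from a concentration by piecewise-linear interpolation between EPA breakpoints. The sketch below assumes the pre-2024 EPA 24-hour PM2.5 breakpoint table (EPA revised the lower breakpoints in 2024); the report does not state which conversion the map used, and `pm25_to_aqi` is a hypothetical helper name.

```python
import math

# Assumed: pre-2024 EPA 24-hour PM2.5 breakpoints (EPA revised this table in 2024).
# Each row is (C_lo, C_hi, I_lo, I_hi, category).
BREAKPOINTS = [
    (0.0, 12.0, 0, 50, "Good"),
    (12.1, 35.4, 51, 100, "Moderate"),
    (35.5, 55.4, 101, 150, "Unhealthy for Sensitive Groups"),
    (55.5, 150.4, 151, 200, "Unhealthy"),
    (150.5, 250.4, 201, 300, "Very Unhealthy"),
    (250.5, 350.4, 301, 400, "Hazardous"),
    (350.5, 500.4, 401, 500, "Hazardous"),
]


def pm25_to_aqi(concentration):
    """Convert a PM2.5 concentration (ug/m3) to an (AQI, category) pair."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # Truncate to one decimal place; the tiny epsilon guards against
    # floating-point artefacts in the multiplication.
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi, category in BREAKPOINTS:
        if c <= c_hi:
            aqi = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            # Round half up to the nearest integer AQI.
            return int(math.floor(aqi + 0.5)), category
    raise ValueError("concentration is beyond the AQI scale")


print(pm25_to_aqi(20.0))  # (68, 'Moderate')
```

Because the interpolation is linear within each band, the AQI ranges on the map correspond directly to concentration bands, which is why the breakpoints double as category boundaries.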
Next, air quality data for the month of February were collated from six jurisdictions in San Mateo County: Redwood City, Menlo Park, Burlingame, San Bruno, San Carlos and San Mateo. Voronoi polygons were then created around each sensor using the st_voronoi function, so that air quality readings could be interpolated to individual census block groups (CBGs).
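Each Voronoi polygon contains exactly the locations that are closer to its sensor than to any other sensor, so finding the polygon a point falls in is equivalent to finding its nearest sensor. The sketch below uses that equivalence to give each CBG the reading of the sensor whose polygon contains the CBG's centroid. It assumes planar coordinates in metres (as in a projected CRS), and the sensor IDs, GEOIDs and readings are hypothetical. A centroid lookup is a simplification: intersecting the polygons with CBG boundaries would instead give an area-weighted value when a CBG spans several polygons.

```python
import math


def nearest_sensor(point, sensors):
    """Return the id of the sensor whose Voronoi cell contains `point`.

    A point lies in a sensor's Voronoi cell exactly when that sensor is its
    nearest site, so a nearest-neighbour search is equivalent.
    `point` is (x, y); `sensors` maps id -> (x, y, pm25) in planar metres.
    """
    px, py = point
    return min(sensors, key=lambda sid: math.hypot(sensors[sid][0] - px,
                                                   sensors[sid][1] - py))


def interpolate_to_cbgs(cbg_centroids, sensors):
    """Assign each CBG the PM2.5 value of the Voronoi cell holding its centroid."""
    return {geoid: sensors[nearest_sensor(c, sensors)][2]
            for geoid, c in cbg_centroids.items()}


# Hypothetical sensors and CBG centroids (projected x/y in metres).
sensors = {"A": (0.0, 0.0, 10.0), "B": (1000.0, 0.0, 20.0)}
cbgs = {"cbg1": (100.0, 0.0), "cbg2": (900.0, 50.0)}
print(interpolate_to_cbgs(cbgs, sensors))  # {'cbg1': 10.0, 'cbg2': 20.0}
```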
The resulting plot of Voronoi polygons in San Mateo County is shown below:
Using this information, February PM2.5 trends for the six jurisdictions were compiled and plotted. For ease of visualization, the trends and geographic locations of these jurisdictions have been combined in the following dashboard so that users can view and interpret the results more readily:
https://alexngld.shinyapps.io/AlexanderNg_A5_DB1/
In general, air quality across these jurisdictions was relatively uniform in February, with the exception of Redwood City, which had higher mean PM2.5 levels.
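The per-jurisdiction trend and the comparison of monthly means can be sketched as follows. The sketch assumes each reading is a hypothetical (jurisdiction, date, PM2.5) tuple. It averages readings within each day first and then averages the daily means, so that days with more readings do not dominate the monthly figure; the values are illustrative, not the report's data.

```python
from collections import defaultdict
from statistics import mean


def daily_means(readings):
    """Map jurisdiction -> {date: mean PM2.5 for that day}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for jurisdiction, day, pm25 in readings:
        buckets[jurisdiction][day].append(pm25)
    return {j: {d: mean(v) for d, v in sorted(days.items())}
            for j, days in buckets.items()}


def monthly_means(readings):
    """Map jurisdiction -> mean of its daily means."""
    return {j: mean(days.values()) for j, days in daily_means(readings).items()}


# Illustrative readings only.
readings = [
    ("Redwood City", "2022-02-01", 12.0),
    ("Redwood City", "2022-02-01", 14.0),
    ("Redwood City", "2022-02-02", 16.0),
    ("San Mateo", "2022-02-01", 8.0),
    ("San Mateo", "2022-02-02", 6.0),
]
print(monthly_means(readings))  # {'Redwood City': 14.5, 'San Mateo': 7.0}
```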
Next, equity analyses were carried out to compare air quality exposure across income groups and racial demographics. In these analyses, American Community Survey (ACS) data were used to extract income levels at the CBG level, while Decennial Census data were used to determine racial demographics at the block level. The equity analysis was based on average PurpleAir air quality readings from the last week of February.
The same San Mateo County (SMC) sensor data from Part 1 were used, with two key differences: the data were filtered to “inside” sensors only, and only readings from the last week of February were included.
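The two filters can be sketched with the standard library. The record layout (`sensor_id`, `location`, `date`, `pm25` keys) is hypothetical, and the year is a parameter because the report does not state it; computing the window back from the last day of February keeps the "last week" correct in leap years.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def last_week_of_february(year):
    """Return (first_day, last_day) of the final seven days of February."""
    last_day = date(year, 3, 1) - timedelta(days=1)  # Feb 28 or Feb 29
    return last_day - timedelta(days=6), last_day


def weekly_inside_means(records, year):
    """Average PM2.5 per sensor, keeping only 'inside' sensors in the window."""
    start, end = last_week_of_february(year)
    values = defaultdict(list)
    for r in records:
        if r["location"] == "inside" and start <= r["date"] <= end:
            values[r["sensor_id"]].append(r["pm25"])
    return {sid: mean(v) for sid, v in values.items()}


records = [  # hypothetical records
    {"sensor_id": 1, "location": "inside", "date": date(2022, 2, 25), "pm25": 4.0},
    {"sensor_id": 1, "location": "inside", "date": date(2022, 2, 27), "pm25": 6.0},
    {"sensor_id": 1, "location": "inside", "date": date(2022, 2, 10), "pm25": 50.0},
    {"sensor_id": 2, "location": "outside", "date": date(2022, 2, 25), "pm25": 9.0},
]
print(weekly_inside_means(records, 2022))  # {1: 5.0}
```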
Income distributions for San Mateo County were extracted from ACS 2019 data (Group B19001). The income data were combined with the PM2.5 data using left_join on the respective CBG and GEOID fields, and households were then grouped by PM2.5 quartile range.
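The quartile binning and left join can be sketched in plain Python. `statistics.quantiles` supplies the three quartile cut points (the inclusive method is chosen here; other quantile definitions can shift values that sit near a cut). The ACS rows are hypothetical (GEOID, income bracket, household count) tuples and, mirroring a left join, a row whose CBG has no PM2.5 value is kept with a quartile of `None` rather than dropped.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def pm25_quartiles(pm25_by_cbg):
    """Map each CBG GEOID to its PM2.5 quartile (1 = cleanest, 4 = worst)."""
    cuts = quantiles(pm25_by_cbg.values(), n=4, method="inclusive")
    return {geoid: bisect_right(cuts, v) + 1 for geoid, v in pm25_by_cbg.items()}


def households_by_quartile(income_rows, pm25_by_cbg):
    """Left-join ACS income rows to PM2.5 quartiles and total the households."""
    quartile = pm25_quartiles(pm25_by_cbg)
    totals = Counter()
    for geoid, bracket, households in income_rows:
        # As in a left join, unmatched CBGs are kept with quartile None.
        totals[(quartile.get(geoid), bracket)] += households
    return dict(totals)


# Hypothetical CBG-level PM2.5 averages and ACS rows.
pm25 = {g: v for g, v in zip("ABCDEFGH", [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])}
rows = [("A", "Less than $10,000", 30),
        ("H", "$200,000 or more", 20),
        ("Z", "$50,000 to $59,999", 10)]
print(households_by_quartile(rows, pm25))
```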
Racial demographics for San Mateo County were extracted from Decennial 2020 data (Group P1). The race categories were combined with the PM2.5 data using left_join on the respective block and GEOID10 fields, and the population was then grouped according to the respective PM2.5 levels.
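Once each block carries a PM2.5 quartile, the race breakdown reduces to population shares within each quartile, which can be compared with the county-wide share: a group is over-represented in the highest quartile when its share there exceeds its overall share. The sketch below assumes hypothetical (quartile, race, population) tuples produced by the block-level join.

```python
from collections import defaultdict


def race_shares(rows):
    """Return ({quartile: {race: share}}, {race: county-wide share}).

    `rows` holds (pm25_quartile, race, population) tuples, one per block
    and race category, after the block-level left join.
    """
    by_quartile = defaultdict(lambda: defaultdict(float))
    overall = defaultdict(float)
    for quartile, race, pop in rows:
        by_quartile[quartile][race] += pop
        overall[race] += pop

    def normalise(counts):
        total = sum(counts.values())
        if total == 0:
            return {race: 0.0 for race in counts}
        return {race: pop / total for race, pop in counts.items()}

    return ({q: normalise(c) for q, c in by_quartile.items()},
            normalise(overall))


# Hypothetical joined rows.
rows = [(1, "White alone", 60), (1, "Asian alone", 40),
        (4, "White alone", 30), (4, "Asian alone", 70)]
by_q, overall = race_shares(rows)
print(by_q[4]["Asian alone"] > overall["Asian alone"])  # True
```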
The resulting income and racial equity analyses for San Mateo County can be found in the following dashboard, where users can toggle between the race and income analyses:
https://alexngld.shinyapps.io/AlexanderNg_A5_DB2/
On racial equity, residents identifying as “Asian Alone” are more likely to experience higher PM2.5 levels. On income, by contrast, households in different income tiers are distributed relatively evenly across the PM2.5 quartile ranges.
This segment of the report focuses on designing a scoring system to identify where the availability of air quality information is disproportionately low. The scoring metric is based on racial distribution, and users can define the weight assigned to each racial demographic.
The following maps outline the geographic coverage of each sensor within the respective CBGs. This makes it possible to identify CBGs where multiple sensors provide overlapping coverage.
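Overlapping coverage can be detected by counting, for each CBG, how many sensors cover it. The report does not define a sensor's coverage area, so the sketch assumes a hypothetical fixed radius around each sensor and tests each CBG centroid against it, using planar coordinates in metres; both the radius and the centroid test are illustrative choices, and all IDs and coordinates are made up.

```python
import math


def sensors_covering(cbg_centroids, sensors, radius_m):
    """Map each CBG GEOID to the number of sensors within radius_m of its centroid.

    Assumption: a sensor "covers" a CBG when the CBG centroid lies within
    a fixed radius of the sensor (planar x/y coordinates in metres).
    """
    counts = {}
    for geoid, (cx, cy) in cbg_centroids.items():
        counts[geoid] = sum(
            1 for sx, sy in sensors.values()
            if math.hypot(sx - cx, sy - cy) <= radius_m
        )
    return counts


def overlapping(counts):
    """Return the CBGs covered by more than one sensor, sorted by GEOID."""
    return sorted(g for g, n in counts.items() if n > 1)


sensors = {"s1": (0.0, 0.0), "s2": (300.0, 0.0), "s3": (5000.0, 0.0)}  # hypothetical
cbgs = {"c1": (100.0, 0.0), "c2": (5000.0, 100.0), "c3": (20000.0, 0.0)}
counts = sensors_covering(cbgs, sensors, radius_m=500.0)
print(counts)               # {'c1': 2, 'c2': 1, 'c3': 0}
print(overlapping(counts))  # ['c1']
```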
Next, racial demographics from Decennial data were used to calculate the relative population density of each racial group in each CBG, based on the CBG's racial mix and its area in square miles. Weights from 0 to 1 can be assigned to each racial demographic, and a proposed score based on sensor coverage is obtained by multiplying each weight by the corresponding racial density. Accordingly, a new sensor would be proposed for the CBG with the lowest score under the assigned weight distribution. The scoring dashboard is available at the following link:
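The weighting step can be sketched as below. The weighted density term follows the description above: each weight multiplied by that group's population per square mile, summed over groups. How sensor coverage enters the score is not spelled out, so the sketch assumes a hypothetical score of sensors per unit of weighted density. A CBG with few sensors relative to its weighted population density then scores lowest and is proposed first, with ties broken in favour of the denser CBG. All CBG records and weights are illustrative.

```python
def weighted_density(pop_by_race, area_sqmi, weights):
    """Sum of weight x (population per square mile) over racial groups."""
    return sum(weights.get(race, 0.0) * pop / area_sqmi
               for race, pop in pop_by_race.items())


def propose_location(cbgs, weights):
    """Return the GEOID with the lowest (hypothetical) coverage score.

    `cbgs` maps GEOID -> (population by race, area in sq mi, sensor count).
    Assumed score = sensor count / weighted density; ties go to the denser CBG.
    """
    if any(not 0.0 <= w <= 1.0 for w in weights.values()):
        raise ValueError("weights must lie between 0 and 1")
    best_key, best_geoid = None, None
    for geoid, (pops, area, n_sensors) in cbgs.items():
        density = weighted_density(pops, area, weights)
        if density <= 0:
            continue  # no weighted population to serve
        key = (n_sensors / density, -density)
        if best_key is None or key < best_key:
            best_key, best_geoid = key, geoid
    return best_geoid


cbgs = {  # illustrative values only
    "c1": ({"Asian alone": 1000, "White alone": 500}, 0.5, 2),
    "c2": ({"Asian alone": 200, "White alone": 2000}, 1.0, 0),
    "c3": ({"Asian alone": 3000}, 1.0, 0),
}
print(propose_location(cbgs, {"Asian alone": 1.0, "White alone": 0.0}))  # c3
print(propose_location(cbgs, {"Asian alone": 0.0, "White alone": 1.0}))  # c2
```

Changing the weights changes which CBG wins, which is the behaviour the interactive dashboard exposes to users.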
The techniques highlighted in this report demonstrate the strong potential for translating data into readily understood formats through a combination of data analytics and dashboard visualization. For instance, an interactive scoring feature that proposes PurpleAir sensor locations is a useful application for evidence-based decision-making that can inform better policy. Tools such as these make it easier to bring different stakeholders into the conversation, refine the boundaries and parameters used for analysis, and develop resources tailored to the specific needs of the target audience.